
    Modeling Routing Choices in Unidirectional Pedestrian Flows

    In this work we present a simple routing model capable of capturing pedestrians' path choices in the presence of a herding effect. The model is tested and validated against data from a large-scale tracking campaign which we conducted during the GLOW 2019 festival. The choice between alternative paths is modeled as an individual cost-minimization procedure, with the cost function associated with the (estimated) traveling time. To trigger herding effects, the cost function is supplemented with a penalty term, modulated as a function of the fraction of pedestrians walking along each route. The model is shown to provide an accurate quantitative description of the decision process.
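    The cost-minimization idea above can be sketched in a few lines. This is a hypothetical illustration, not the paper's calibrated model: the functional form of the herding term and the parameter values are assumptions.

```python
# Hypothetical sketch of route choice as individual cost minimization
# with a herding term; form and parameters are illustrative assumptions.

def route_cost(travel_time, fraction_on_route, herding_strength=0.5):
    """Cost of a route: estimated travel time minus a herding discount
    that grows with the fraction of pedestrians already on that route."""
    return travel_time - herding_strength * fraction_on_route * travel_time

def choose_route(routes):
    """Pick the route with minimal individual cost.
    `routes` maps a route name to (travel time [s], fraction of crowd on it)."""
    return min(routes, key=lambda r: route_cost(*routes[r]))

routes = {
    "short": (60.0, 0.2),   # faster, but less popular
    "long":  (70.0, 0.8),   # slower, but heavily used
}
print(choose_route(routes))  # herding makes the busier route win
```

    With these illustrative numbers the busier route has cost 70 - 0.5*0.8*70 = 42 against 60 - 0.5*0.2*60 = 54 for the faster one, so herding overrides the pure travel-time choice.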

    Early Experience on Using Knights Landing Processors for Lattice Boltzmann Applications

    The Knights Landing (KNL) is the codename for the latest generation of Intel processors based on the Intel Many Integrated Core (MIC) architecture. It relies on massive thread and data parallelism, and fast on-chip memory. This processor operates in standalone mode, booting an off-the-shelf Linux operating system. The KNL peak performance is very high - approximately 3 Tflops in double precision and 6 Tflops in single precision - but sustained performance depends critically on how well all parallel features of the processor are exploited by real-life applications. We assess the performance of this processor for Lattice Boltzmann codes, widely used in computational fluid dynamics. In our OpenMP code we consider several memory data layouts that meet the conflicting computing requirements of distinct parts of the application, and sustain a large fraction of peak performance. We make some performance comparisons with other processors and accelerators, and also discuss the impact of the various memory layouts on energy efficiency.
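    The data-layout trade-off mentioned above can be illustrated with the two classic choices, Array of Structures (AoS) versus Structure of Arrays (SoA). The sketch below is illustrative (not the paper's code) and uses a D3Q19-like population count:

```python
import numpy as np

# Illustrative sketch of two memory layouts for the Q = 19 populations
# of a D3Q19 lattice with N sites (not the paper's actual code).
N, Q = 1000, 19

aos = np.zeros((N, Q))   # Array of Structures: one site's populations are contiguous
soa = np.zeros((Q, N))   # Structure of Arrays: one population across sites is contiguous

# Streaming-like sweeps touch one population index across many sites,
# so the SoA layout gives unit-stride, vector-friendly access:
soa[3, :] += 1.0         # contiguous in memory
aos[:, 3] += 1.0         # strided by Q in memory; same values, worse access pattern
```

    Which layout wins depends on the kernel: stencil-style streaming favors SoA vectorization, while site-local collision can favor AoS locality, which is why the abstract speaks of conflicting requirements.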

    Towards learning Lattice Boltzmann collision operators

    In this work we explore the possibility of learning collision operators for the Lattice Boltzmann Method from data using a deep learning approach. We compare a hierarchy of designs of the neural network (NN) collision operator and evaluate the performance of the resulting LBM method in reproducing the time dynamics of several canonical flows. In the current study, as a first attempt to address the learning problem, the data was generated by a single-relaxation-time BGK operator. We demonstrate that a vanilla NN architecture has very limited accuracy. On the other hand, by embedding physical properties, such as conservation laws and symmetries, it is possible to increase the accuracy by several orders of magnitude and correctly reproduce the short- and long-time dynamics of standard fluid flows.
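    For reference, the single-relaxation-time BGK operator that generates the training data has a simple closed form: relax the populations toward a local equilibrium. A minimal D2Q9 sketch (illustrative; lattice constants are the standard ones, the relaxation time is an arbitrary choice):

```python
import numpy as np

# Minimal D2Q9 BGK collision sketch; tau = 0.6 is an arbitrary example value.
w = np.array([4/9] + [1/9]*4 + [1/36]*4)            # lattice weights
c = np.array([[0,0],[1,0],[0,1],[-1,0],[0,-1],
              [1,1],[-1,1],[-1,-1],[1,-1]])         # lattice velocities

def equilibrium(rho, u):
    cu = c @ u                                      # c_i . u for each direction
    return rho * w * (1 + 3*cu + 4.5*cu**2 - 1.5*(u @ u))

def bgk_collide(f, tau=0.6):
    rho = f.sum()                                   # conserved mass
    u = (f @ c) / rho                               # conserved momentum / rho
    return f - (f - equilibrium(rho, u)) / tau      # relax toward equilibrium

f = equilibrium(1.0, np.array([0.05, 0.0]))
f_post = bgk_collide(f)
# collision conserves mass and leaves an equilibrium state unchanged
```

    The embedded physical properties mentioned in the abstract are exactly the invariants visible here: mass and momentum conservation, and the lattice symmetries of the velocity set.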

    Optimization of lattice Boltzmann simulations on heterogeneous computers

    High-performance computing systems are more and more often based on accelerators. Computing applications targeting those systems often follow a host-driven approach, in which hosts offload almost all compute-intensive sections of the code onto accelerators; this approach only marginally exploits the computational resources available on the host CPUs, limiting overall performance. The obvious step forward is to run compute-intensive kernels in a concurrent and balanced way on both hosts and accelerators. In this paper, we consider exactly this problem for a class of applications based on lattice Boltzmann methods, widely used in computational fluid dynamics. Our goal is to develop just one program, portable and able to run efficiently on several different combinations of hosts and accelerators. To reach this goal, we define common data layouts enabling the code to exploit the different parallel and vector options of the various accelerators efficiently, and matching the possibly different requirements of the compute-bound and memory-bound kernels of the application. We also define models and metrics that predict the best partitioning of workloads among host and accelerator, and the optimally achievable overall performance level. We test the performance of our codes and their scaling properties using, as testbeds, HPC clusters incorporating different accelerators: Intel Xeon Phi many-core processors, NVIDIA GPUs, and AMD GPUs.
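    The workload-partitioning idea admits a simple first-order model: assign each device a lattice fraction proportional to its sustained throughput, so host and accelerator finish each step together. This is a hedged sketch of that reasoning, not the paper's actual model, and the throughput numbers are made up:

```python
# First-order performance model (assumption: a kernel's time scales as the
# lattice fraction assigned to a device divided by its sustained throughput).

def best_split(host_perf, acc_perf):
    """Lattice fraction to keep on the host so both devices finish together:
    alpha * W / host_perf == (1 - alpha) * W / acc_perf."""
    return host_perf / (host_perf + acc_perf)

def step_time(work, alpha, host_perf, acc_perf):
    """Time per step: host and accelerator run concurrently,
    so the step ends when the slower of the two finishes."""
    return max(alpha * work / host_perf, (1 - alpha) * work / acc_perf)

alpha = best_split(host_perf=100.0, acc_perf=400.0)   # illustrative throughputs
print(alpha)                                          # host keeps 20% of the sites
```

    With these numbers the balanced split gives a step time of 0.002 per unit of work against 0.0025 for full offload, a 25% speedup, which is the gain the host-driven approach leaves on the table.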

    Performance and portability of accelerated lattice Boltzmann applications with OpenACC

    An increasingly large number of HPC systems rely on heterogeneous architectures combining traditional multi-core CPUs with power-efficient accelerators. Designing efficient applications for these systems has been troublesome in the past, as accelerators could usually be programmed only using vendor-specific languages, threatening maintainability, portability, and correctness. Several new programming environments try to tackle this problem. Among them, OpenACC offers a high-level approach based on compiler directives to mark regions of existing C, C++, or Fortran codes to run on accelerators. This approach directly addresses code portability, leaving to compilers the support of each different accelerator, but one has to carefully assess the relative costs of portable approaches versus computing efficiency. In this paper, we address precisely this issue, using as a test-bench a massively parallel lattice Boltzmann algorithm. We first describe our multi-node implementation and optimization of the algorithm, using OpenACC and MPI. We then benchmark the code on a variety of processors, including traditional CPUs and GPUs, and make accurate performance comparisons with other GPU implementations of the same algorithm using CUDA and OpenCL. We also assess the performance impact associated with portable programming, and the actual portability and performance-portability of OpenACC-based applications across several state-of-the-art architectures.

    High-statistics pedestrian dynamics on stairways and their probabilistic fundamental diagrams

    Staircases play an essential role in crowd dynamics, allowing pedestrians to flow across large multi-level public facilities such as transportation hubs and office buildings. Achieving a robust understanding of pedestrian behavior in these facilities is a key societal necessity. What makes this an outstanding scientific challenge is the extreme randomness intrinsic to pedestrian behavior. Any quantitative understanding necessarily needs to be probabilistic, including average dynamics and fluctuations. In this work, we analyze data from an unprecedentedly high-statistics year-long pedestrian tracking campaign, in which we anonymously collected millions of trajectories across a staircase within Eindhoven train station (NL). The campaign was made possible by a state-of-the-art, faster-than-real-time computer vision approach hinged on 3D depth imaging and YOLOv7-based depth localization. We consider both free-stream conditions, i.e. pedestrians walking undisturbed, and trafficked conditions with uni- and bidirectional flows. We report position vs. density, considering the crowd as a 'compressible' physical medium. We show how pedestrians willingly opt to occupy less space than available, accepting a certain degree of compressibility. This is a non-trivial physical feature of pedestrian dynamics, and we introduce a novel way to quantify this effect. As density increases, pedestrians strive to keep a minimum distance d = 0.6 m from the person in front of them. Finally, we establish first-of-their-kind fully resolved probabilistic fundamental diagrams, in which we model the pedestrian walking velocity as a mixture of a slow- and a fast-paced component. Notably, averages and modes of the velocity distribution turn out to be substantially different. Our results, including probabilistic parametrizations based on few variables, are key towards improved facility design and realistic simulation of pedestrians on staircases.
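    The mean-versus-mode distinction can be made concrete with a two-component mixture. The sketch below uses a Gaussian mixture with made-up weights, means, and widths, purely to illustrate the effect; the paper's fitted distributions and parameter values are not reproduced here:

```python
import numpy as np

# Illustrative two-component mixture for walking speed [m/s];
# all parameters below are assumptions, not the paper's fitted values.

def mixture_pdf(v, w_slow=0.3, mu_slow=0.4, s_slow=0.1,
                   mu_fast=0.8, s_fast=0.15):
    gauss = lambda v, mu, s: np.exp(-0.5*((v - mu)/s)**2) / (s*np.sqrt(2*np.pi))
    return w_slow*gauss(v, mu_slow, s_slow) + (1 - w_slow)*gauss(v, mu_fast, s_fast)

v = np.linspace(0.0, 1.5, 3001)
p = mixture_pdf(v)
mean = np.sum(v * p) * (v[1] - v[0])   # average speed (discrete integral)
mode = v[np.argmax(p)]                 # most likely speed
print(mean, mode)                      # mean near 0.68, mode near 0.80
```

    Even in this toy version the mean sits between the two component means while the mode locks onto the dominant fast component, so summarizing the crowd by its average speed alone would misstate the typical pedestrian.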

    Massively parallel lattice–Boltzmann codes on large GPU clusters

    This paper describes a massively parallel code for a state-of-the-art thermal lattice–Boltzmann method. Our code has been carefully optimized for performance on one GPU and to have a good scaling behavior extending to a large number of GPUs. Versions of this code have already been used for large-scale studies of convective turbulence. GPUs are becoming increasingly popular in HPC applications, as they are able to deliver higher performance than traditional processors. Writing efficient programs for large clusters is not an easy task, as codes must adapt to increasingly parallel architectures, and the overheads of node-to-node communications must be properly handled. We describe the structure of our code, discussing several key design choices that were guided by theoretical models of performance and experimental benchmarks. We present an extensive set of performance measurements and identify the corresponding main bottlenecks; finally we compare the results of our GPU code with those measured on other currently available high-performance processors. Our results are a production-grade code able to deliver a sustained performance of several tens of Tflops, as well as a design and optimization methodology that can be used for the development of other high-performance applications for computational physics.

    Accelerating the D3Q19 Lattice Boltzmann Model with OpenACC and MPI

    Multi-GPU implementations of the Lattice Boltzmann method are of practical interest as they allow the study of turbulent flows in large-scale simulations at high Reynolds numbers. Although programming GPUs, and in general power-efficient accelerators, typically guarantees high performance, the lack of portability in their low-level programming models implies significant effort for the maintainability and porting of applications. Directive-based models such as OpenACC look promising in tackling these aspects. In this work we evaluate the performance of a multi-GPU implementation of the Lattice Boltzmann method accelerated with OpenACC. The implementation allows for multi-node simulations of fluid flows in complex geometries, also supporting heterogeneous clusters, for which the load balancing problem is investigated.